Algorithms for Finding Multivariate Discriminant Rules for Classification and Regression Trees

Author

  • Yasuhiko Morimoto
Abstract

Progress in technologies for data input, such as POS (point of sale) systems, and in technologies for data storage, such as high-density magnetic and optical recording devices, has made it easier for enterprises to collect massive amounts of data and to store them on hard disk at very low cost. Since the early 1990s, many enterprises have been interested in extracting previously unnoticed information that inspires new marketing strategies from these huge databases. Technologies for extracting such information, or knowledge, from huge databases are called “data mining.” Data mining covers technologies for association analysis, classification and regression, cluster analysis, and evolution analysis. Most of these have been widely studied in the fields of databases, statistics, and machine learning. Data mining in general focuses on efficiency, so that it can handle emerging databases that are too large to be processed by conventional techniques. Among these technologies, this dissertation focuses on association analysis and on classification and regression. The author considered association rules on numerical attributes, whereas conventional data mining techniques are effective only for categorical attributes. This accomplishment significantly expanded the applications of association analysis. The author then explored classification and regression problems. By utilizing techniques developed for numerical association rules, the author proposed accurate and comprehensive classification and regression trees. Among these accomplishments, the author's primary contributions are the works on classification and regression problems. In general, huge databases often contain many attributes, and there are many correlations among those attributes. However, conventional data mining techniques cannot handle correlations well. In the statistics literature, multivariate analysis methods have been used to handle correlations in numerical databases.
Many statistical methods, such as “principal component analysis” and “factor analysis,” are categorized as multivariate analysis. Most multivariate analysis methods assume a linear correlation, so such conventional techniques are effective for data that have linear correlations. However, data contain various types of correlations that cannot be handled by the conventional methods. In order to handle various types of correlations, the author proposed multivariate optimized discriminant rules, which can be defined on more than one attribute, and presented efficient algorithms for finding the rules. The algorithms efficiently find multivariate discriminant rules

Similar resources

Predicting The Type of Malaria Using Classification and Regression Decision Trees

Maryam Ashoori 1*, Fatemeh Hamzavi 2. 1 School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran; 2 School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran. Abstract. Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...

Application of Data Mining Algorithms in Discriminating Sediment Sources in the Nodeh Watershed, Gonabad

Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription, and the identification of critical areas within the watersheds. The sediment source ascription involves two...

Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images

Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-intensive, free satellite data are available only in coarse resolution, and the use of manned aircraft is relatively costly. Recently, unmanned aeria...

A Model for Predicting Hemodialysis Filter Type Using Data Mining Techniques

Introduction: Inadequate dialysis for patients' kidneys as a mortality risk necessitates the presence of a pattern to assist staff in dialysate part to provide the proper services for dialysis patients and also the proper management of their treatment. Since the role of buffer type in the adequacy of dialysis is determinative, the present study is aimed at determining hemodialysis buffer type. ...

Linear and Nonlinear Multivariate Classification of Iranian Bottled Mineral Waters According to Their Elemental Content Determined by ICP-OES

The combinations of inductively coupled plasma-optical emission spectrometry (ICP-OES) and three classification algorithms, i.e., partial least squares discriminant analysis (PLS-DA), least squares support vector machine (LS-SVM) and soft independent modeling of class analogies (SIMCA), for discriminating different brands of Iranian bottled mineral waters, were explored. ICP-OES was used for th...

Publication date: 2002